Pattern Matching in Degenerate DNA/RNA Sequences

Authors

  • Mohammad Sohel Rahman
  • Costas S. Iliopoulos
  • Laurent Mouchard
Abstract

In this paper, we consider the pattern matching problem in DNA and RNA sequences where either the pattern or the text can be degenerate, i.e., contain sets of characters. We present an asymptotically faster algorithm for this problem that runs in O(n log m) time, where n and m are the lengths of the text and the pattern, respectively. We also suggest an efficient implementation of our algorithm, which runs in linear time when the pattern is small. Finally, we describe how our approach can be used to solve the distributed pattern matching problem.
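To make the problem statement concrete, here is a minimal sketch of degenerate matching in Python. It is not the paper's O(n log m) algorithm, which the abstract does not spell out; it uses the standard bit-parallel Shift-And technique as a stand-in. The function name `degenerate_find` and the choice of IUPAC nucleotide codes to encode the character sets are assumptions made for illustration. Two symbols are taken to match when their sets of bases share at least one base.

```python
# IUPAC nucleotide codes mapped to the bases each one stands for.
# U (RNA) is treated as T so DNA and RNA input share one table (an assumption).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degenerate_find(text, pattern):
    """Return start positions where pattern matches text.

    Either string may contain degenerate symbols; two symbols match
    when their base sets intersect. Hypothetical helper, Shift-And based.
    """
    m = len(pattern)
    if m == 0:
        return []

    # base_mask[b] has bit j set when pattern[j] can stand for base b.
    base_mask = {b: 0 for b in "ACGT"}
    for j, sym in enumerate(pattern):
        for b in IUPAC[sym]:
            base_mask[b] |= 1 << j

    accept = 1 << (m - 1)  # bit that signals a full-length match
    state = 0              # bit j set: pattern[0..j] matches a suffix of the text read so far
    hits = []
    for i, sym in enumerate(text):
        # Pattern positions compatible with this (possibly degenerate) text symbol.
        sym_mask = 0
        for b in IUPAC[sym]:
            sym_mask |= base_mask[b]
        # Extend every partial match by one symbol and start a new one.
        state = ((state << 1) | 1) & sym_mask
        if state & accept:
            hits.append(i - m + 1)
    return hits
```

For example, `degenerate_find("ANGT", "ACG")` reports position 0, because the text symbol N can stand for C. When the pattern fits in a machine word, each text symbol costs a constant number of bit operations, which is the usual sense in which such bit-parallel methods are linear for small patterns; Python integers are unbounded, so longer patterns still work but each step then costs more.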

Similar resources

A New Approach to Pattern Matching in Degenerate DNA/RNA Sequences and Distributed Pattern Matching

In this paper, we consider the pattern matching problem in DNA and RNA sequences where either the pattern or the text can be degenerate, i.e., contain sets of characters. We present an asymptotically faster algorithm for this problem that runs in O(n log m) time, where n and m are the lengths of the text and the pattern, respectively. We also suggest an efficient implementation of our algorithm...

Parallel Algorithms for Degenerate and Weighted Sequences Derived from High Throughput Sequencing Technologies

Novel high throughput sequencing technologies have redefined the way genome sequencing is performed. They are able to produce millions of short sequences in a single experiment and with a much lower cost than previous methods. In this paper, we address the problem of efficiently mapping and classifying millions of degenerate and weighted sequences to a reference genome, based on whether they oc...

Indexing Factors in DNA/RNA Sequences

In this paper, we present the Truncated Generalized Suffix Automaton (TGSA) and an efficient on-line algorithm for its construction. TGSA is a novel type of finite automaton suitable for indexing DNA and RNA sequences, where the text is degenerate, i.e., contains sets of characters. TGSA indexes the so-called k-factors, the factors of the degenerate text with length not exceeding a given ...

Designing Of Degenerate Primers-Based Polymerase Chain Reaction (PCR) For Amplification Of WD40 Repeat-Containing Proteins Using Local Allignment Search Method

Degenerate primer-based polymerase chain reaction (PCR) is commonly used for isolation of unidentified gene sequences in related organisms. For designing the degenerate primers, we propose the use of a local alignment search method to find conserved regions long enough to design an acceptable primer pair. To test this method, a WD40 repeat-containing domain protein from Beauveria bass...

Prediction and pattern matching algorithms for RNA multi-structures

RNA (ribonucleic acid) molecules have various functions in cells. Just as they can store and deliver the DNA message for protein synthesis (messenger RNAs), they can also directly catalyze chemical reactions or act as regulators (functional RNAs, also called non-coding RNAs). Nowadays, recent sequencing technologies yield billions of genomic sequences – DNA, RNA – at a very s...

Publication date: 2007